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We consider Gibbs and block Gibbs samplers for a Bayesian hi- 
^sj . erarchical version of the one-way random effects model. Drift and 

minorization conditions are established for the underlying Markov 
chains. The drift and minorization are used in conjunction with re- 
r~' ■ suits from J. S. Rosenthal [J. Amer. Statist. Assoc. 90 (1995) 558- 

C/j ' 566] and G. O. Roberts and R. L. Tweedie [Stochastic Process. Appl. 

(~| ■ 80 (1999) 211-229] to construct analytical upper bounds on the dis- 

tance to stationarity. These lead to upper bounds on the amount of 
burn-in that is required to get the chain within a prespecified (to- 
tal variation) distance of the stationary distribution. The results are 
illustrated with a numerical example. 
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T-j- I 1. Introduction. We consider a Bayesian hierarchical version of the stan- 

VO ■ dard normal theory one-way random effects model. The posterior density for 

this model is intractable in the sense that the integrals required for mak- 

(•~^ I ing inferences cannot be computed in closed form. Hobert and Geyer (1998) 

^" ■ analyzed a Gibbs sampler and a block Gibbs sampler for this problem and 

showed that the Markov chains underlying these algorithms converge to 

1 -^ I the stationary (i.e., posterior) distribution at a geometric rate. However, 

C^ • Hobert and Geyer stopped short of constructing analytical upper bounds 

^ . on the total variation distance to stationarity. In this article, we construct 

^ I such upper bounds and this leads to a method for determining a sufficient 

burn-in. 

Our results are useful from a practical standpoint because they obviate 



X 



H 



C^ I troublesome, ad hoc convergence diagnostics [Cowles and Carlin (1996) and 
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2 G. L. JONES AND J. P. HOBERT 

Cowles, Roberts and Rosenthal (1999)]. More important, however, we be- 
heve that this is the first analysis of a practically relevant Gibbs sampler 
on a continuous state space that provides viable burn-ins. By practically 
relevant, we mean that the stationary distribution is complex enough that 
independent and identically distributed (i.i.d.) sampling is not straightfor- 
ward. We note that the Gibbs samplers analyzed by Hobert (2001) and 
Rosenthal (1995a, 1996) are not practically relevant since i.i.d. samples can 
be drawn from the corresponding stationary distributions using simple, se- 
quential sampling schemes [Jones (2001) and Marchev and Hobert (2004)]. 
Some notation is now introduced that will allow for a more detailed overview. 
Let X = {Xi, i = 0, 1, . . . } be a discrete time, time homogeneous Markov 
chain that is irreducible, aperiodic and positive Harris recurrent. Let P"(a;, •) 
be the probability measure corresponding to the random variable X„ con- 
ditional on starting the chain at Xq = x; that is, P" is the n-step Markov 
transition kernel. Let '7r(-) be the invariant probability measure of the chain 
and let || • || denote the total variation norm. Formally, the issue of burn-in 
can be described as follows. Given a starting value xq and an arbitrary e > 0, 
can we find an n* = n*(xo,e) such that 

(1) \\P^\xo,-)-7ri-)\\<e? 

If the answer is "yes," then, since the left-hand side of (1) is nonincreasing in 
the number of iterations, the distribution of X^ is within e of vr for all k>n*. 
Because we are not demanding that n* be the smallest value for which (1) 
holds, it is possible that the chain actually gets within e of stationarity 
in much fewer than n* iterations. For this reason, we call n* a sufficient 
burn-in. 

Several authors [see, e.g., Meyn and Tweedie (1994), Rosenthal (1995a), 
Cowles and Rosenthal (1998), Roberts and Tweedie (1999) and Douc, Moulines and Rosenthal 
(2002)] have recently provided results that allow one to calculate n* when 
X is geometrically ergodic. However, to use these results one must establish 
both a drift condition and an associated minorization condition for X. [For 
an accessible treatment of these concepts, see Jones and Hobert (2001).] In 
this article we establish drift and minorization for the Gibbs samplers ana- 
lyzed by Hobert and Geyer (1998). These conditions are used in conjunction 
with the theorems of Rosenthal (1995a) and Roberts and Tweedie (1999) to 
construct formulas that can be used to calculate n*. 

The rest of the article is organized as follows. The model and algorithms 
are described in Section 2. Section 3 contains important background ma- 
terial on general state space Markov chain theory as well as statements of 
the theorems of Rosenthal (1995a) and Roberts and Tweedie (1999). This 
section also contains a new conversion lemma that provides a connection 
between the two different types of drift used in these theorems. We estab- 
lish drift and minorization for the block Gibbs sampler in Section 4 and the 
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same is done for the Gibbs sampler in Section 5. In Section 6 the resuhs 
are illustrated and Rosenthal's theorem is compared with the theorem of 
Roberts and Tweedie. Section 7 contains some concluding remarks. 

2. The model and the Gibbs samplers. Consider the following Bayesian 
version of the standard normal theory one-way random effects model. First, 
conditional on 6 = {9i,. . . , Ok) and Ag the data Yij are independent with 

where i = 1, . . . , X and j = 1, . . . , mj. At the second stage, conditional on /i 
and Ag , 6i,. . . ,6k and Ag are independent with 

9i\fi,Xo '-^'N{i^i,Xg ) and Ag ~ Gamma(a2,62)5 

where 02 and 62 are known positive constants. [We say W ~ Gamma(a,/9) if 
its density is proportional to w°^~^e~^f^I{w > 0).] Finally, at the third stage 
11 and Xe are assumed independent with 

/i ~ N(?7io, s^^) and Ag ~ Gamma(ai, 61), 

where mQ,SQ,ai and hi are known constants; all but mo are assumed to be 
positive so that all of the priors are proper. The posterior density of this 
hierarchical model is characterized by 

(2) ^h{e, ^t, X\y) oc fiy\e, A,)/(0|m, Xe) f (X^) f (fi) f (Xe) , 

where A = (A^, Ae)"'", y is a vector containing all of the data, and / denotes 
a generic density. [We will often abuse notation and use vr/i to denote the 
probability distribution associated with the density in (2).] Expectations 
with respect to tt^ are typically ratios of intractable integrals, the numera- 
tors of which can have dimension as high as K + 3 [Jones and Hobert (2001)]. 
Thus, to make inferences using vr/i, we must resort to (possibly) high dimen- 
sional numerical integration, analytical approximations or Monte Carlo and 
Markov chain Monte Carlo techniques. 

In their seminal article on the Gibbs sampler, Gelfand and Smith (1990) 
used the balanced version of this model (in which rrii = m) as an exam- 
ple. [See also Gelfand, Hills, Racine-Poon and Smith (1990) and Rosenthal 
(1995b).] Each iteration of the standard, fixed-scan Gibbs sampler consists 
of updating all of the K + 3 variables in the same predetermined order. The 
full conditionals required for this Gibbs sampler are now reported. Define 

K 

i=l 
K 



V2{0) = ^mi{di - yif and SSE = ^(yij-yj 
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where yi = m^ J2T=iyij- The fuh conditionals for the variance components 
are 

(3) A6i|6',/i,Ae,y~Gammaf — - +oi, -^ h 61 

and 

(4) Ae|6',/i,Ae,y~Gammal — + 02, h 62 

where M = X]i w-i- Letting 0„j = (^1, . . . ,6'j_i,0j+i, . . . ,6k)^ and 9 = K~^ x 
J2i(^ii the remaining fuh conditionals are 

6ii 6'_i,/i,A6),Ae,y~N — — ■ — ,- — ■ - 

V Ag + TTljAe Ag + mi\ 

for i = l, . . . ,K and 

\o \ \ ,-fsomo + KXge 1 

fi9, Xe, Ae, y ~ N — , — — 

V So + ^Ae So + KXe 

We consider the fixed-scan Gibbs sampler that updates /x, then the ^j's, 
then Ag and Xe- Since the 6*4 's are conditionally independent given (/i, A), the 
order in which they are updated is irrelevant. The same is true of Xg and Ae 
since these two random variables are conditionally independent given {9,fj,). 
If we write a one-step transition as {fi',6',X') -^ {fi,6,X), then the Markov 
transition density (MTD) of our Gibbs sampler is given by 



k{f,,e,x\fi',e',x') = f{fi\e',x'g,x',,y) 



K 



l[f{ei\e^i,i^,x'g,x'e,y) 



i=l 



X f{Xg\e, /_i, A'e, y)f{Xe\6, /_i, Ae, y). 

Hobert and Geyer (1998) considered this same update order. We note here 
that, in general, Gibbs samplers with different update orders correspond to 
different Markov chains. However, two chains whose update orders are cyclic 
permutations of one another converge at the same rate. 

As an alternative to the standard Gibbs sampler, Hobert and Geyer (1998) 
introduced the more efficient block Gibbs sampler in which all of the com- 
ponents of ^ = {di, . . . ,dK,fJ-)'^ are updated simultaneously. These authors 
showed that ^|A,y ~ N((^*,y) and gave formulas for ^* = ^*(A,y) and V = 
V{X,y). Because we will make extensive use of these formulas, they are re- 
stated in Appendix A. One iteration of the block Gibbs sampler consists of 
updating Xg, Ag and ^ in some order. Due to the conditional independence of 
Xg and Xe, the block Gibbs sampler is effectively a two- variable Gibbs sam- 
pler or data augmentation algorithm [Tanner and Wong (1987)], the two 
components being ^ and A. We choose to update A first because, as we will 
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see later, updating the most complicated distribution last typically simpli- 
fies the calculations required to establish drift and minorization conditions. 
If we write a one-step transition as (A',^') -^ (A,^), then the corresponding 
MTD is given by 

MA,C|A',n = /(A|e',y)/(e|A,y) 

= f{Me,y)f{Xe\e,y)fme,Xe,y). 

Hobert and Geyer (1998) considered the opposite update order because they 
were not attempting to simultaneously establish drift and minorization. 
Note, however, that our update order is just a cyclic permutation of the 
order used by Hobert and Geyer. 

A proper formulation of the burn-in problem requires some concepts and 
notation from Markov chain theory. These are provided in the following 
section. More general accounts of this material can be found in Nummelin 
(1984), Meyn and Tweedie (1993) and Tierney (1994). 

3. Markov chain background. Let X CW for p>l and let B denote the 
associated Borel cr-algebra. Suppose that X = {Xi, i = 0, 1, . . . } is a discrete 
time, time homogeneous Markov chain with state space X and Markov tran- 
sition kernel P; that is, for x G A' and AgB, P{x, A) = Pr(Xj+i £ A\Xi = x). 
Also, for re = 1,2,3,..., let P" denote the re-step transition kernel, that 
is, P"(x,^) = Pr(Xj+„ G A\Xi = x) so, in particular, P = P^. Note that 
P"(x,-) is the probability measure of the random variable Xn conditional 
on starting the chain at Xq = x. 

Let i^ be a measure on B. We will say that the Markov chain X satisfies 
assumption (A) if it is i/-irreducible, aperiodic and positive Harris recurrent 
with invariant probability measure vr(-). It is straightforward to show that 
the Gibbs samplers described in the previous section satisfy assumption (A) 
with v equal to Lebesgue measure. Under assumption (A), for every x £ X 
we have 

||P"(x,-)-vr(-)|UO asre^oo, 

where ||P"(x,-) — 7r(-)|| := sup^gg |P"(2;, A) — tt{A)\ is the total variation 
distance between P" and vr. The chain X is called geometrically ergodic if 
it satisfies assumption (A) and, in addition, there exist a constant < t < 1 
and a function 5 : A" i— > [0, 00) such that, for any x £ X, 

(6) ||P"(x,.)-^(-)ll<5(x)t" 

for re = 1, 2, It has recently been demonstrated that establishing drift and 

minorization conditions for X verifies geometric ergodicity (the existence 
of g and t) and yields an upper bound on the right-hand side of (6). See 
Jones and Hobert (2001) for an expository look at this theory. In this paper, 
we will focus on the results due to Rosenthal (1995a) and Roberts and Tweedie 
(1999). Slightly simplified versions of these results follow. 
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Theorem 3.1 [Rosenthal (1995a)]. Let X be a Markov chain satisfying 
assumption (A). Suppose X satisfies the following drift condition. For some 
function V :X ^^ [0, oo), some < 7 < 1 and some b < 00, 

(7) E[V{Xi+i)\X,=x]<^Vix)+b yx£X. 

Let C = {x £ X:V{x) < d^}, where d^ > 26/(1 — 7) and suppose that X 
satisfies the following minorization condition. For some probability measure 
Q on B and some e > 0, 

(8) P{x,-)>eQ{-) VxgC. 
Let Xq = xq and define two constants as follows: 

1+dR 



a ■■ 



1 + 26 + -fd-n 
Then, for any < r < 1, 



\P-{xo,-)-^{-)\\<{l-eY^ + 



and U = l + 2{-fdR + b). 



jjr 



a 



l~r 



1 + 



1-7 



+ V{xo)]. 



Theorem 3.2 [Roberts and Tweedie (1999, 2001)]. Let X be a Markov 
chain satisfying assumption (A). Suppose X satisfies the following drift con- 
dition. For some function W :X >-^ [1, od), some < p <1 and some L < co, 

(9) E[W{Xi+i)\Xi = x]<pW{x)+LIsix) VxGA-, 

where S = {x £ X : W{x) < d^n} o,nd 

L 



dRT > 



1-p 



1. 



Suppose further that X satisfies the following minorization condition. For 
some probability measure Q on B and some e > 0, 

(10) P{x,-)>eQ{-) Vxg5. 

Let Xq = xq and define some constants as follows: 



■■P + 



L 



J 



(kc^rt - e)(l + c^rt) + LduT 



I + ^rt' " (l + dRT)K 

log[(l/2)(L/(l -p)+ wixo))] log[(l - e)-'J] 



/3rt = exp 



log(K-i) 
logKlog(l — e) 



r]. 



log(Ac-i) 



_logJ-log(l-e)_ 
Then if J >1 and n' = k — (,> r/(l — e)/e, we have, for any 1 < /3 < /3rT; 



(11) ||p'=(xo,-)-vr(-)||< 



1 



[l + rj/n'Y/^i 



1 + 



n 



V 



1 + 



1 
n' 



n' /r) 



/?-". 
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Remark 3.1. The version of Theorem 3.2 in Roberts and Tweedie (1999) 
reUes on their Theorem 5.2, whose proof contains an error. Using Roberts 
and Tweedie's (1999) notation, suppose 1/ : ^ i— > [1, oo), d> 0, C = {x £ X : 
V{x) < d} and h{x,y) = {V{x) + V{y))/2. Roberts and Tweedie (1999) claim 
that 

H^^y) > {'^ + d)i[cxcri^^y)^ 

which is false and, in fact, all that we can claim is that 

1 + d 

h{x,y) > -^—IicxC]<x,y). 

We have accounted for this error in our statement of Theorem 3.2 and we 
are grateful to an anonymous referee for bringing the error to our attention. 

Remark 3.2. Roberts and Tweedie (1999) provide a different bound for 
the case J < 1 but, since we do not use it in our application (see Section 6), 
it is not stated here. 

Remark 3.3. Roberts and Tweedie (1999) show that the right-hand 
side of (11) is approximately minimized when /3 = /3rt/(1 + rj/n')^'^ . 

Remark 3.4. It is well known [see, e.g., Meyn and Tweedie (1993), 
Chapter 15] that (7) and (8) together [or (9) and (10) together] imply that 
X is geometrically ergodic. See Jones and Robert (2001) for an heuristic 
explanation. 

In our experience it is often easier to establish a Rosenthal-type drift 
condition than a Roberts-and-Tweedie-type drift condition. The following 
new result provides a useful connection between these two versions of drift. 

Lemma 3.1. Let X be a Markov chain satisfying assumption (A). Sup- 
pose there exist V :X h^ [0, oo), 7 G (0, 1) and b < 00 such that 

(12) E[V{Xn+i)\Xn = x]<^Vix)+b VxeA". 
Set W{x) = 1 + V{x). Then, for any a> 0, 

(13) E[W{Xn+i)\Xn = x]<pW{x) + LIc{x) yxeX, 
where p= {a + ^)/{a + 1), L = 6+ (1 — 7) and 
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Proof. Clearly, (12) implies that 

E[W{Xi+i)\Xi = x]< jW{x) + 6 + (1 - 7) = jW{x) + L VxeX. 
Set AW{x) = E[W{Xn+i)\Xn = x] - W{x) and /5 = (1 - j)/{a + 1). Then 

E[W{Xr,+i)\Xn = x] < [1 - (a + l)(3]W{x) + L 
or, equivalently, 

AW{x) < -l3W{x) - apW{x) + L 
for all xeX.Ii X ^C, then 



(a+2)L (a+2)L^^ 
^ ^ a(l-p) a(l-7) a/3' 
_ _i_ ^f^\ 



Now write VF(x) = Tg + s(x), where s(a;) > 0. Then 



AW{x) < -pW{x) - a(3 



^ + ^("). 



+ L 



= -pWix) - af3six) 
< -(3W{x). 
If, on the other hand, x £ C, then 

AW{x) < -PW{x) - a(3W{x) + L 
<-pW{x) + L. 
Now putting these together gives 

E[W{Xn+l)\Xr^ = x] < {1 - (3)W{x) + Lie 

= pW{x) + LIc. □ 



Remark 3.5. Since 



(a + l)L^_^_^ 



a{l-p) 1-p 

(13) constitutes a drift condition of the form (9). Therefore, if we can es- 
tablish (12) as well as a minorization condition on the set C, it will be as 
straightforward to apply Theorem 3.2 as it is to apply Theorem 3.1. Indeed, 
this is the approach we take with our Gibbs samplers. Moreover, we use 
a = 1 in our application since ^(i_ \ is minimized at this value. 

While the Gibbs sampler is easier to implement than the block Gibbs 
sampler, it is actually harder to analyze because it is effectively a three- 
variable Gibbs sampler as opposed to the block Gibbs sampler, which is 
effectively a two-variable Gibbs sampler. Thus, we begin with block Gibbs. 
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4. Drift and minorization for the block Gibbs sampler. Drift conditions 
of the form (7) are estabhshed for the unbalanced and balanced cases in 
Sections 4.1 and 4.2, respectively. A minorization condition that works for 
both cases is established in Section 4.3. Throughout this section we assume 
that m' = min{mi,m2, . ■ . jmR} ^ 2 and that K >3. 

4.1. Drift: unbalanced case. Define two constants as follows: 

and 62 



2ai + K-2 2a2 + M-2 

Also define ^3 = {K + 1)62 and ^4 = (^2 X]j=i "^-7 • ^^^ assumptions about 
K and m' guarantee that < c^j < 1 for i = 1,2,3,4. Set 6 = max{5i,(53}. 
Also, let A denote the length of the convex hull of the set {yi,y2, ■ ■ ■ ,yK,^TT'o} 
and define 

26i , 262 + SSE 

ci = — and C2 



2ai + K - 2 2a2 + M - 2 

Proposition 4.1. Fix 7 G {S,l) and let (j)i and (f)2 be positive numbers 
such that ^-^ + (5 < 7. Define the drift function as Vi{d,fi) = (j)ivi{d,fi) + 
(j)2V2{(^), where vi{9,fi) and V2{9) are as defined in Section 2. Then the block 
Gibbs sampler satisfies (7) with 



K 

ci + C2^mri + A"A^ 



Proof. It suffices to show that 



+ Mc2{K+l) + MA'^ 



(14) E[Vi{e, fi)\\', 9', fi'] < cl)i6ivi{9', 12') + (^^ + 63^ (l^2V2{9') + b 
because 

cp,6ivi{9',fi') + (^^ + Ssy2V2{9') + b 

< (f>iSvi {9',f,')+(^ + 6] <I)2V2 {9') + b 
<j4>iVi{9',fi')+j(t>2V2{0') + b 

= -fVi{e',fi') + b. 

In bounding the left-hand side of (14), we will use the following rule: 

(15) E[Vi{e,^L)\X',9',^/]=E[Vi{9,f,)\e',^/] = E{E[Vi{9,f,)\X]\9',f,'}, 
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which fohows from the form of the MTD for the block Gibbs sampler given 
in (5). We begin with some preliminary calculations. First, note that 



E{x^'\e',fi') = - — -i^- + 



(16) V y I 'r~. 2ai + K-2 2ai + K-2 

= ci + 6ivi{9' , iJ,') 
and 



-im/ /^_ 262+SSE , v^iO') 



E{X-'\9',f^') = -^—-- + 



(17) ' ^ ' "^ ' 2a2 + M -2 2a2 + M -2 

= C2 +S2V2{0'). 

We now begin the main calculation. Using our rule, we have 

K 

i=l 



K 



■■ElY^E[{e,-^^f\\] 



4 = 1 



?',/x' 



Using results from Appendix A, we have 

E[{e,-^lf\\] 

= Var(ei|A) + Var(^IA) - 2Cov[(^„^)|A] + [E{ei\\) - E{^i\\)f 

_ 1 A^ + (Ae+m^Ae)^ -2Ae(Ae + mjAe) 

\0 + mi\e {so + t){\e + rriiXeY 

+ [E{e,\\)-E{i,\\)f 



Xe+rriiXe {so + t){Xe + mi\e) 

1 TfijAp . 9 

< \ —^ h A^. 

rUiXe t{Xg+miXe) 

Hence, 

K K 

(18) Y.Em - /^)'| A] < A,"^ Y. ^i^ + V + K/\\ 

i=l i=l 

Thus, by combining (16)^(18) we obtain 

E[(t)iv^{e,ii)\e\ij.'\ 



(19) 



< (5i0iui(6i', n') + 5i4>iV2{0') + (/>! 



K 



ci + C2 ^ m/ + K/S^ 



i=l 
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Now 



ii;b2(^)|e',/i'] =5]m,ii;[(0i -yi)>',/z'] =i? 5]m,i?[(^, -y,)'|A] 

i K i 

We can bound the innermost expectation as follows: 
Em - y,)'|A] = Var(0i|A) + [E{e,\\) - mf 



r,f^' 



Xg + niiXe (so + i) ( Ae + mj Ae)2 

1 Afl .9 

niiXe t{Xe + iriiXe) 



K 



Hence 

(20) Y. ^i^m - m?\X] <{K + l)A-i + MA2, 
and so by combining (17) and (20) we obtain 

(21) E[(j)2V2ie)\e',fM'] < 63^2V2i0') + Mi^ + 1)C2 + MA^]. 

Combining (19) and (21) yields (14). D 

Remark 4.1. The upper bound on the total variation distance that is 
the conclusion of Theorem 3.1 involves the starting value of the Markov 
chain, xq, only through V{xo). Moreover, given the way in which ^(xo) 
enters the formula, it is clear that the optimal starting value, in terms of 
minimizing the upper bound, is the starting value that minimizes V{xq). 
This starting value is also optimal for the application of Theorem 3.2. In 
Appendix B we show that the value of (^,/i) that minimizes Vi{6,fi) has 
components 

^ 4>i[T,f=i{'nyyj/{4>i + 4>2mj))/J2f=i{mj/{(t)i + 4>2mj))] + (t)2miyi 

(pi + (j)2mi 

While the conclusion of Proposition 4.1 certainly holds when the data are 
balanced, it is possible to do better in this case. Specifically, the proof of 
Proposition 4.1 uses the general bounds on [E{6i\X) — E{iJ,\X)]'^ and [E{6i\X) — yi\^ 
given in Appendix A. Much sharper bounds are possible by explicitly using 
the balancedness, and these lead to a better drift condition. 



12 
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4.2. Drift: balanced case. Now assume that ttij = m > 2 for alli= 1, . . . ,K 
and let 85 = K52 £ (0,1)- 

Proposition 4.2. Fix 7 G (5,1) and /ei (p be a positive number such 
that (j)65 + 5 < 7. Define the drift function as V2{0,fi) = <j)Vi{6,fi) + m~^V2{0). 
Then the block Gibbs sampler satisfies (7) with 

K 
b = (t)Ci + [{4)K + K + l)/m]c2 + max{(/), 1} ^maxKy - 7/^)^, (mo - mf}, 

i=l 

where y := K~^ Y.f=i Vi- 

Proof. When the data are balanced, 

MXeXe 



\q + mXe ' 
so that E{iJ.\X) = {ty + rrao-so)/(so + t)- Hence for alH = 1, . . . , K we have 



[E{6i\X)-yi 



Afl 



e_ fty_ + rnQSQ\ Xemyj 



l2 



< 



Xg + mXe V so + t ) Ag + mX 

Ag yp(j/ - Vi) + so("io - yj)"' ^ 
Ag + mAg 

Ae 
Ae + mAg 



Vi 



so + i 

' ^(j/-y»)^ + go("^o-i/»)' 

so + i 



where the last inequality is Jensen's. A similar argument shows that, for all 
i = l,...,i^, 



L, . . . , J. 1 , 



\E(Q,\X)-E{ii\X)f< 



mXp 



Xe + mXe 



SQ+t 



Therefore, 



(t>[E{ei\x)-E{,ji\x)f + [E{ei\x)-m? 

t{y - mY + SoiruQ -yif 



< max{(/), 1} 



SQ+t 



and hence 



K 



i=l 



Y^{4>\E{e,\x) - E{^,\x)f + [E{ei\x) - y,f} 

K 
< max{</>, 1} ^ max{(y - yif , {mo - yif}. 



4 = 1 
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To prove the result, it suffices to show that 

(22) E[V2ie,^i)\X',e',ii'] < (l)5ivi{e\ii') + (c/x^s + <53)m^^2(^') + b 

since 

cl)5ivi{e',ii') + (#5 + <53)m-iy2(^') + h 

< (P5vi{e', fj.') + (0^5 + 5)m-^V2{e') + b 

<'j4>vi{6',fi') + ^m~^V2{9') + h 

= ^V2{e',ii') + h. 

The remainder of the proof is nearly identical to the proof of Proposition 4.1 
and is therefore left to the reader. D 

Remark 4.2. This result is stated (without proof) in Jones and Hobert 
[(2001), Appendix A] and the statement contains an error. Specifically, h is 
stated incorrectly and should appear as above. 

4.3. Minorization. We now use a technique based on Rosenthal's (1995a) 
Lemma 6b to establish a minorization condition of the form (8) on the set 

Sb = {{e,fi):Vi{9,fi) <d} = {{6,fi):cl)ivi{9,fi) + ^2V2{0) < d}, 

for any d> 0. Since V2 of Proposition 4.2 is a special case of Vi, this minoriza- 
tion will also work for V2. First note that Sb is contained in Cb '■= C^i nC^j > 
where 

CB,={{e,fi):vii9,fi)<d/(bi} and Cb, = {(0,^) :t^2(e) < d/<^2}. 

Hence, it suffices to establish a minorization condition that holds on Cb- 
We will accomplish this by finding an e > and a density q{X,9,fi) on 
IR2_ X M^ X M such that 

A:(A, e, f,\X', 9', fi') > eq{X, 9, fi) V {9', ^') G Cb, 

where A;(A, 9, fi\X' , 9', /i') is the MTD for the block Gibbs sampler given in (5). 
We will require the following lemma, whose proof is given in Appendix C. 

Lemma 4.1. Let Gamma(a,/3;2;) denote the value of the Gamma(a,/3) 
density at the point x > 0. If a > 1, b > and c > are fixed, then, as a 
function of x. 



inf Gamma(a,6-|- /3/2;2;) 



r Gamma(a,6;a;), if x < x* , 



0</3<c / ' / \ Gamma(a,6 + c/2;x), if x > x* , 

where 

X = — log 1 + — 
c V 2o 
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Here is the minorization condition. 
Proposition 4.3. Let q{X,9,fi) be a density on Mi x R^ x M defined 



as 



q{X,e,n) 



hi{Xe 



where 



k.hi{Xg)d\e 



K 



/l2(Ae 



k+h2{\e)d\e 



hi{\e) 



for 



and 



Gamma ( h o-i , fei ; Ag 

Gammal — + ai, — + 61; Xq 



^i{K + 2ai) (,^ d 
; log 1 + 



/(e|A,2/), 

\e < Ag, 
^9 > An, 



h2{Xe) 



2bi^i 



:m SSE ^ \ 

Gamma ( — + 02, — — + 62; Ag I , 

-M ^2SSE + d 
Gammal — + 02, — h 02; Ag ) , 



Ae < A*, 
Ae > A*, 



for 



A! 



(/)2(M + 2a2) 



log 1 + 



d 



62(262 + SSE), 

S'ei £5 = [/jg /ii(Ae) (iA6i][/]g /i2(Ae) (iAe]. T/ien the Markov transition den- 
sity for the block Gibbs sampler satisfies the following minorization condi- 
tion: 

k{\, e, fi\x', e', /i') > SBQix, e, m) v {e', fi') g Cb. 



Proof. We use S, = {9,fi) and S,' = {0',fi') to simplify notation. If ^' G 
Cb, we have 

/(AelC',y)/(Ae|e',y)/(e|A,y) 

>/(e|A,y) inf [/(Ae|e,y)/(Ae|e,y)] 



>/(e|A,y) 
>/(^|A,y) 



inf f{Xe\^,y) 

t,€CB 

inf /(A,|e,y) 



inf /(Ae|e,y) 
inf /(Ae|e,y) 
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Thus we can take 



q{X,e,f,)^fm,y) 



inf f{Xe\^,y) 



inf /(Ae|e,y) 



eeCBa 



Two applications of Lemma 4.1 yield the result. D 

The drift and niinorization conditions given in Propositions 4.1-4.3 can 
be used in conjunction with either Theorem 3.1 or 3.2 to get a formula 
giving an upper bound on the total variation distance to stationarity for the 
block Gibbs sampler. One such formula is stated explicitly at the start of 
Section 6. 

5. Drift and minorization for the Gibbs sampler. In this section we 
develop drift and niinorization conditions for the Gibbs sampler. We con- 
tinue to assume that m' = ioain{mi,m2, ■ ■ ■ ,mK} > 2 and that K >3. Let 
m" = max{7ni , ?7i2 , • • • ,mj<-}. 

5.1. Drift. Recall that 6i = l/(2ai + K - 2) and define 
K^ + 2Kai 



and 6- 



7 ■ 



2sobi + K'^ + 2Kai 2(ai-l)' 

Clearly 6q G (0, 1). It is straightforward to show that if ai > 3/2, then 6j £ 
(0, 1) and there exists pi G (0, 1) such that 

(23) (K+^^yi<pi. 

Define the function v^iO, A) = j^j^iO - yf. Also, let s^ = Etiim " V? ■ 
We will require the following lemma, whose proof is given in Appendix D. 

Lemma 5.1. Let a and b be constants such that 5b> a>b> 0. Then if 
X and y are positive, 

(24) f^^y+fr^V<i- 

\ax + yj \bx + yj 
Here is the drift condition. 

Proposition 5.1. Assume that ai > 3/2 and let pi G (0, 1) satisfy (23). 
Assume also that 5m' > m" . Fixc^ G (0,min{6i,&2}) and fix ^£ (max{pi,56)'^7}) 1) 
Define the drift function as 

V^{e, A) = e'=^^« + e'^^^^ + -4V + ^3(^, A). 

KdiXe 
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Then the Gibbs sampler satisfies (7) with 



ai+K/2 



61 -C3 

+ {^6 + S7) 



+ 



b2 



&2-C3 



a2+N/2 



1 S^ 

- + (mo-y) + — 



So 



+ 



2bi67 
K 



Proof. It suffices to show that 



(25) 



< 



5^[{K + 5^/5•J)5l 



KSiX' 



+ 



Sr 



m"\' 



hj] +^^[tj 



A'« 



,Xg + m"\'J ' ^''W'f^ + m'X', 
because, using Lemma 5.1 and (23), we have 



^^3(^',A') + 6, 



K5i\' 



K+^f^]6, 



+ 



S7 



m"Xi 



X' + m"K 



+ 6e 



X' 



X' + m'K 



V3ie',x') + b 



^ ^^ + max{<^6,<57}^3(^', A') + 6 



<jV3{e',x') + b. 

Recall that we are considering Hobert and Geyer's (1998) updating scheme 
for the Gibbs sampler: {ii',6',X') -^ (/i,0,A). Establishing (25) requires the 
calculation of several expectations, and these will be calculated using the 
following rule: 

Erne, x)\t^',o',x'] = Erne, x)\o',x'] 

= E{E{E[V3i9,X)\f^,0,e',X%L,e',X'}\9',X'} 

= E{E{Eme,X)\f,,9]\ti,X'}\0',X'}. 
We now establish (25). First, it is easy to show that 



(26) 



E[e^''^<>\9,fi]< 

E[e'"'^'^\9,i^i]< 
67 



n -C3 



&2-C3 



ai+K/2 



a2+N/2 



and 



Now we evaluate E[^^j^\fi',9' ,X']. Note that 



(27) 



E[X,'\fi,9]=6i 



K 



2bl + Y^{9,-^lf 



i=l 
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and 



(28) 



E[{ei - ^x)\i, A'] = Var(0i|^, A') + [E[e,\^i, A') - ^tf 



+ 



rriiX \ _ 2 

7 (^-yi) 



A^ + mjAg vAg + mjAe/ 
^ 1 / m"X'^ \\ ,2 



It follows that 

(29) j:Em-f^f\f^A']<^+(j7^^^) K{^^-yf + s\ 



A' 



A' + m"\' 



Letting 6' = K ^ 1]^ ^i, we have 



Eliix - yY\e', A'] = Var(/x|0', A') + [E{i^\e' , A') - y^ 



(30) 



< 



So + A' A; 
1 



^ + 



'° 7("^0-y) + -^^(F-2/) 



l2 



W + 



SO + KX'q 
So 



so + KX'g 



So + -fs^A^ So + KX 
1 



T{mo-y? + ^;^{W-yf 



so + i^A^ 



<- + (mo-y)2 + r;3(e',A' 
So 



where the first inequality is Jensen's. On combining (27)-(30), we have 



E 



(31) 



r 5i 

_K5iXg 


fi',e',x' 


67 

< — + 
+ 57 



Sv + ^^(v^)"3('''.A') 



h(mo-y) + — 

So A 



+ 



2biS7 
K ' 



The last thing we need to evaluate is £'[^3(6', A)|//', 9', A']. As in Hobert and Geyer 
(1998), Jensen's inequality yields 



(32) E 



KX, 



So + KXe 



1^,0 < 



KE{Xe\^i,( 



< 



K^ + 2Kai 



so + KE{Xg\n, 6) - 2sobi + K'^ + 2Kai 



Sa. 



These authors also note that the conditional independence of the 6*4 's implies 
that 



I \ ATI ^ ^-^Xeii + rriiXeyi 1 v;^ 



1 



K f^^ Xg + niiXe ' K"^ f-^Xe + rriiXe^ 



18 G. L. JONES AND J. P. HOBERT 

from which it fohows that 

E[{e - yf\^l, A'] = Ym{e\^l, \') + [E{e\ii, a') - yf 



1 ^ 1 



K'^^-^X'e + m^X',^ 



1^ A' 



KY.^r^y-y^) 



(33) 1 1 iJ^/ A' 



< 



E TT-^ (^-^-.) 



2 



^2 



K X'g + m'A^ ^ ^ V A'e + rrijAg 

^^ + (A^Tfe;;j(^-^") +^' 

where, again, (part of) the first inequahty is Jensen's. On combining (30), 
(32) and (33), we have 



Combining (26), (31) and (34) yields (25). □ 



1 s2 

h(mo-y) + — 

So K 



Remark 5.1. Note that our drift condition for the block Gibbs sam- 
pler (Proposition 4.1) holds for all hyperparameter configurations (corre- 
sponding to proper priors) and nearly all values of m' and m" . In con- 
trast, it is assumed in Proposition 5.1 that oi > 3/2 and that 5m' > m" . 
On the other hand, Hobert and Geyer's (1998) drift condition for the Gibbs 
sampler involves even more restrictive assumptions about ai and the re- 
lationship between m' and m" . Specifically, Hobert and Geyer (1998) as- 
sume that oi > {2,K - 2)/{2K - 2) and that m' > (\/5 - 2)m" . Note that 
(3K - 2)/(2K - 2) > 3/2 for ah iiT > 2 and that 5 > (^5 - 2)~^ w 4.23. 

Remark 5.2. In this case the optimal starting value minimizes 

V,{e,X) = e^^'^ + e^^'^ + -^+ ^^' {9-yf. 

KoiXe So + li^Xe 

The last term will vanish as long as the 0j's are such that 6 = y. The optimal 
starting value for Xg is the minimizer of the function e"^^ ® +67/{K6iXg). This 
cannot be computed in closed form, but is easily found numerically. Finally, 
since Ag = is not appropriate, we simply start Ae at a small positive number. 

5.2. Minorization. Fix d> and define So = {{G, A) : V3{6, A) <d}. Sim- 
ilar to our previous work with the block Gibbs sampler, our goal will be to 
find a density q{fi,9,X) on M x R^ x M^ and an e > such that 

k{fi, 9, A|/i', A', 9') > eqi^i, 9, A) V {9', A') G 5^. 
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As before, we will actually establish the minorization on a superset of Sq 
with which it is more convenient to work. Let C4 = 5j/{K5id) and put q 

and Cu equal to ^ — v (mo — y)^ + d and y + v (mo — y)"^ + d, respectively. 
We show in Appendix E that 5^ C Co = Cci H (7^2 H Ccg , where 

Cg^ = [{0, A) :C4 < Ae < ^|, Cg, = |(e, A) :0 < Ae < i^j, 



Cgs 



So + KXg 



Also, Cgi n Cg2 is nonempty as long as dlogd > {c^5i)/{K5i). We will 
require the following obvious lemma. 

Lemma 5.2. Let N(t, cr^;x) denote the value of the N(t, cr^) density at 
the point x. If a<b, then, as a function of x, 

inf N(r,a2;x) = /^(^'^»' ^/^<(« + ^)A 



a<T<b " ' ' ' \N(a,cr^;x 
Here is the minorization condition. 



ifx> (a + 6)/2. 



Proposition 5.2. Let q{i^i,9,X) be a density on R x 



t>K 



as follows: 



q{lM,9,\) 



gi{ti,0)g2{n) 



kkKgi{f^,0)g2{^i)d9dfi 



/(A|/x,0,y), 



where 



91(1^,0) =(^y^\wl-^-^j:m-^^?+m,{ 



defined 



and 



Set 



52 (m) 



£G 



^{cu,[so + Klog{d)/c3] ^/i), fi<y, 
N{ci,[so + Klog{d)/c3]-^;fi), fi>y. 

So + Kci 



so + Klog{d)/c3 



1/2 



gi{i^,0)g2{ij)d6d^i 



Then the Markov transition density for the Gihhs sampler satisfies the mi- 
norization condition 



k{f,, 6, A|^', e', A') > ecqi^t, 9, A) V (9', A') G Cg- 
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Proof. Recall that k{^l, 9, X\fi', 9', A') = f{f,\9', A', y)fi9\fi, A', y)/(A|/i, 9, y). 
For {9', X')£Cg, we have 

f{fi\9',X',y)f{9\i^,X',y) 

>^^^mi_^J{f,\9',X',y)f{9\i,,X',y) 



> 



> 



inf f(n\9',X',y) 
inf f{n\9',X',y) 



^,inf^^/(%.A',,) 



^cSU/CI"-^'") 



Using the fact that the 0j's are conditionally independent, we have 

= v.oifnc„. n /(^.If ■ *'■ !-) ^ ft „,|,f„,„^ /(^-lA". A'. !/)■ 

Now, using Jensen's inequality again, we have 

fiO^\^i,X',y) 

X'g+rriiXW X'gfi + TJiiX'^iJi^ '^ 



Afl + rriiX' 



2tt 



■exp< 



X'n + rriiX' 



' Xn + niiX' 



2ir 



X exp< 



An +mjA' •" 



A'« 



Ag + miX'f. 






Ag + mjA^ 



> 



' Xn + rriiX' 



2ir 



X exp 



Aji + rriiX' 



X'g + TTljAe * X'g + TTljA'e 



Hence, 



^»±^exp{-i|Aa«. - rt= + miA:{ft - y.f] }^ 



.JiUJ^'^'-''''^ 



> ( -^ 1 exp-^ 



27ry 



K 
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d 



Now, if {9', A') G Cg, then C4 < \'e < -^ and hence 



/Z7\2~ 



^ / gp + i^C4 so + {Klogd)/c3 

-^ so + {Klogd)/c3\ 27r 

/ so + (A'logd)/c3/ somo + i^A^F^2~ 
xexp< ^ — — [fi- 



so + KX' 



Thus, 



inf f{n\e',X',y) 



^ / so + Kc4 . so + {Klogd)/c3 

- V So + (A'log(i)/c3 {e',y)eCG V 27r 

/ so + (A'logd)/c3/ somo + KX'gW-^^^ 
" ^^n ^ r - .o + AA' 



^ / So + Kca . / so + (Klog(J)/c3 

- Y So + (A log (i)/c3 (e',A™GCG3 V 27r 

/ so + (i^logd)/c3/ somo + i^A^^^2> 






so + (A:iogd)/c3' 
where the last inequality is an application of Lemma 5.2. □ 

Remark 5.3. In Appendix F we give a closed form expression for eq 
involving the standard normal cumulative distribution function. 

6. A numerical example. Consider a balanced data situation and let 
TThi') denote the probability measure corresponding to the posterior density 
in (2). Let P"((Ao,'^o)) ■) denote the n-step Markov transition kernel for the 
block Gibbs sampler started at (Ao,^o). [Equation (5) shows that a starting 
value for Aq is actually not required.] We now write down an explicit upper 
bound for 

r"((Ao,^o),-)-^h(-)ll, 
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based on Theorem 3.1 and Propositions 4.2 and 4.3. Although it has been 
suppressed in the notation, both vr/^ and P" depend heavily on the six hy- 
perparameters, ai, 6i, 02, 62, so and niQ. Our upper bound holds for all 
hyperparameter configurations such that ai, &i, 02,62, sq are positive, that 
is, all hyperparameter configurations such that the priors on Xq, Ag and fi 
are proper. Due to its generality, the bound is complicated to state. First, 

Ljivij -y^)^ 

that 



recall that SSE = J2i jiUij ~ yi)^ ^ where i/i = m ^J2T=iyij- Recall further 



61 



1 



2ai + K-2' 



82 



2a2 + M - 2 ' 



^3 = (i^ + l)<52, S^ = K62, 6 = max{6i,63}, ci = 2bi6i and C2 = (262 + SSE)52. 
Note that all of these quantities depend only on the data and the hyperpa- 
rameters. 

Now choose 7 G {S,l) and (/> > such that (P65 + 6 <^. Also, let 

K 

b = (j)Ci + [{(l)K + K + l)/m]c2 + max{0, 1} ^max{(^ - yif, {mo - yif}, 

and choose d-^ > 26/(1 — 7). Finally, let 



i=l 



SB = / hi{\e)d\e / /i2(Ae)dAe 



where 



hi{\e 



for 



and 



fK \ 

Gamma ( h ai , 61 ; Ag 1 , 

Gamma | h ai , — - + Oi ; A^ 

2 2(j) 



4>{K + 2ai) ( dn 
■ log 1 + 



Xg < A^, 

Afl > Afl, 



dn 



2b^ 



Gamma 



/l2(Ae 



SSE ^ ^ 

1 2 +"2,^- + 02;Ae 



[M SSE + mdR , ^ 
Gamma I — + 02, h 62; A( 



Ag < Ag, 
-^e > Ag, 



for 



(M + 2a2) 
mdR 



log 1 + 



mdR 



252 + SSE 
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Table 1 
Simulated data 



Cell 


1 


2 


3 


4 


5 


Vi 


-0.80247 


-1.0014 


-0.69090 


-1.1413 


-1.0125 



M = mK = 50 
y = M-' Ell Ej=i y^. = -0.92973 

ssE = Eti E]=i (y^j ~y^? = 32.990 



Note that e-q cannot be calculated in closed form, but can be evaluated 
numerically with four calls to a routine that evaluates the incomplete gamma 
function. Recall from the statement of Theorem 3.1 that 

1 + 2& + 7dR 
Here is the bound. For any < r < 1 and any n G {1, 2,3,...}, 

ll^"((Ao,eo),-)-vr,(-)|| 

< (1 - sb)™ + \^^p;) (i + Y^ + 'A«i(^o, m) + m-^vim 

Using the optimal starting values from Remark 4.1, this becomes 
ll^"((Ao,C*)r)-vr,(-)|| 

Explicit upper bounds can also be written for the block Gibbs sampler in the 
unbalanced case and for the Gibbs sampler. These are similar and are left to 
the reader. It is interesting to note that because our drift and minorization 
conditions for the block Gibbs sampler are free of sQ) so too is the bound 
in (35). 

To evaluate (35), the user must provide values for 7, c^, d^ and r. In our 
experience, small changes in these quantities can lead to dramatically differ- 
ent results. Unfortunately, the right-hand side of (35) is a very complicated 
function of 7, (^, d^ and r. Hence, it would be quite difficult to find "opti- 
mal" values. In our applications of (35), we simply define reasonable ranges 
for these four quantities and then perform a grid search to find the configu- 
ration that leads to the smallest upper bound. We now provide an example 
of the use of (35) and of the analogous bound based on Theorem 3.2. 

The data in Table 1 were simulated according to the model defined in 
Section 2 with K = 5, m = 10, oi = 2.5, 02 = foi = 62 = 1) ^0 = and sq = 1. 
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We now pretend that the origin of the data is unknown and consider using 
the block Gibbs sampler to make approximate draws from four different 
intractable posterior distributions corresponding to the four hyperparameter 
settings listed in Table 2. The first setting in Table 2 is the "correct" prior 
in that it is exactly the setting under which the data were simulated. As one 
moves from setting 2 to setting 4, the prior variances on \g and Ag become 
larger; that is, the priors become more "diffuse." For reasons discussed below 
mo is set equal to y in settings 2-4. 

For each of the hyperparameter settings in Table 2 we used (35) as well 
as the analogous bound based on Theorem 3.2 to find an n* such that 

ll^"*((Ao,C*),-)-^/.(-)ll<0.01. 

The results are given in Tables 3 and 4. For example, consider hyperparam- 
eter setting 2. Theorem 3.1 yields 

ll^''''((Ao,C*)>-)-vr,(-)||<0.00999, 

while Theorem 3.2 yields 

ll^''''((Ao,C*)r)-vr,(-)||<0.00999. 

While examining the n*'s in Tables 3 and 4, keep in mind that it takes about 
1.5 minutes to run one million iterations of the block Gibbs sampler on a 
standard PC. Thus, even the larger n*'s are feasible. 

Note that the results based on Theorem 3.1 are better across the board 
than those based on Theorem 3.2. We suspect that our use of Lemma 3.1 in 
the application of Theorem 3.2 has somewhat (artificially) inflated the n*'s 
in Table 4. 

A comparison of the n*'s for hyperparameter settings 1 and 2 (in either 
table) shows that our bound is extremely sensitive to the distance between 
mo and y. This is due to the fact that ee decreases rapidly as b increases 
and b contains the term X]i=i niax{(y — y^)^, (mo — yiY}-, which is minimized 
when rriQ = y. While there may actually be some difference in the conver- 
gence rates of the two Markov chains corresponding to settings 1 and 2, it 

Table 2 
Four different prior specifications 



Hyperparameter 












setting 


ai 


6i 


a2 


62 


mo 


1 


2.5 


1 


1 


1 





2 


2.5 


1 


1 


1 


y 


3 


0.1 


0.1 


0.1 


0.1 


y 


4 


0.01 


0.01 


0.01 


0.01 


y 
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Table 3 
Total variation bounds for the block Gibbs sampler via Theorem 3.1 



Hyperparameter 














setting 


7 


4> 


dR 


r 


SB 


n* Bound 


1 


0.2596 


0.9423 


15.997 


0.0188 


3.1x10"'' 


7.94 X 10® 0.00999 


2 


0.2596 


0.5385 


3.0079 


0.0789 


0.0171 


3.415x10^ 0.00999 


3 


0.4183 


0.3059 


2.8351 


0.0512 


6.8x10"* 


1.315 X 10^ 0.00999 


4 


0.4340 


0.2965 


2.8039 


0.0483 


8.1x10"'' 


1.1796x10^ 0.00999 



seems unlikely that the difference is as large as these numbers suggest. (Re- 
member, these are only sufficient burn- ins.) It is probably the case that our 
results simply produce a better bound under setting 2 than they do under 
setting 1. This issue is discussed further in Section 7. 

Another noteworthy feature of Tables 3 and 4 is that n* increases as the 
priors become more "diffuse." Figure 1 contains two plots describing the 
relationship between the prior variances on Xg and Ae and n* . [The n*'s in 
this plot were calculated using (35).] Note that n* increases quite rapidly 
with the prior variance on Xg. While it is tempting to conclude that the 
chains associated with "diffuse" priors are relatively slow to converge, we 
cannot be sure that this is the case because, again, these are only sufficient 
burn-ins. However, our findings are entirely consistent with the work of 
Natarajan and McCulloch (1998), whose empirical results suggest that the 
mixing rate of the Gibbs sampler (for a probit-normal hierarchical model) 
becomes much slower as the priors become more diffuse. 

7. Discussion. The quality of the upper bounds produced using Theo- 
rems 3.1 and 3.2 depends not only on the sharpness of the inequalities used 
to prove the theorems themselves, but also on the quality of the drift and mi- 
norization conditions used in the particular application. Consequently, it is 
possible, and perhaps even likely, that the chains we have analyzed actually 
get within 0.01 of stationarity much sooner than the n*'s in Tables 3 and 4 

Table 4 
Total variation bounds for the block Gibbs sampler via Theorem 3.2 

Hyperparameter 

setting p 4> dRT eb n* Bound 



1 


0.615 


0.84 


15.213 


4.1x10"^ 


1.8835x10^ 


0.00999 


2 


0.5975 


0.49 


2.6564 


0.0234 


6.563 X 10^ 


0.00999 


3 


0.7113 


0.3181 


2.8492 


7.2x10"* 


3.3915x10^ 


0.00999 


4 


0.7191 


0.3084 


2.8154 


8.6x10"'' 


2.966 X 10^ 


0.00999 
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would suggest. For example, we know from Table 3 that a sufficient burn-in 
for hyperparameter setting 2 is 3415. Thus, the value 6563 from Table 4 is 
too large by at least a factor of 1.9. The question then becomes how conser- 
vative are the results based on Rosenthal's theorem? As we now explain, this 
question was addressed by van Dyk and Meng (2001) in a different context. 
Hobert (2001) used Theorem 3.1 to calculate a sufficient burn-in for 
a Markov chain Monte Carlo (MCMC) algorithm developed in Meng and van Dyk 
(1999). In the Rejoinder of van Dyk and Meng (2001) an empirical estima- 
tor of the total variation distance to stationarity was developed and used 
to demonstrate that Hobert's upper bound is probably extremely conser- 





ai = bi 

Fig. 1. These two plots show how the ^^dijjuseness" of the priors on Xg and Ae affects 
n* . The top plot shows n* against a2 = 62 where the hyperparameters associated with Ae 
are held constant at a\=hi = 1. When 02 = &2, the prior variance of Ae is I/&2 and the 
prior mean is constant at 1. The bottom plot shows log(log(n*)) against ai = 61 where the 
hyperparameters associated with Ae are held constant at 02 = 62 = 1- When a\ = b\, the 
prior variance of \g is 1/bi and the prior mean is constant at 1. In all cases mo was set 
equal to y. 
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Table 5 
Simulated data 



Cell 


1 


2 


3 


Vi 


-0.54816 


0.92516 


-0.19924 



Mt = mK = 12 

y = M^' Eti E ■=! y^^ = 0.059253 

SSE = Ell E ■=i(y«. - yO' = 20.285 



vative. Indeed, Hobert's sufficient burn-in was n* = 335 wliile van Dyk and 
Meng's simulation results suggested that a burn- in of 2 is sufficient. We 
have experimented with van Dyk and Meng's empirical techniques in our 
situation and have come to similar conclusions. It would be interesting to 
use a Markov chain whose convergence behavior is known exactly to study 
how the sharpness of the bounds produced by Theorems 3.1 and 3.2 changes 
when different drift and minorization conditions are used. 

In situations where it is possible to rigorously analyze two different MCMC 
algorithms for the same family of intractable posteriors, it is tempting to 
compare the algorithms using sufficient burn-in. However, we do not believe 
that this is an entirely fair method of comparison. Consider using our re- 
sults in this way to compare Gibbs and block Gibbs. As we mentioned above, 
our Gibbs sampler is more difficult to analyze than our block Gibbs sam- 
pler. This probably results in relatively lower quality drift and minorization 
conditions for the Gibbs sampler. Indeed, using Propositions 5.1 and 5.2 in 
conjunction with Theorem 3.1 almost always yields extremely large n*'s. 
Specifically, unless the priors are extremely "informative," it is difficult to 
find a hyperparameter configuration under which eq is not effectively 0. 
Here is a comparison. 

The data in Table 5 were simulated according to the model defined in 
Section 2 with K = 3, "m, = 4, ai = a2 = 6i = 62 = 2, sq = 1 and tjiq = 0. We 
use the informative hyperparameter setting: ai = 5, 02 = 2, 61 = 20, 62 = 20, 
uiQ = and sq = 4. For the block Gibbs sampler (35) yields 

ll^'''''((Ao,C')r)-^h(-)ll<0.00999. 

For the Gibbs sampler Propositions 5.1 and 5.2 in conjunction with Theo- 
rem 3.1 yield 

As starting values for the Gibbs sampler we used (^q^ , Ag^ ) = (y, y, y, W~^, 
0.2839) (see Remark 5.2). The constants used to construct these bounds are 
given in Table 6. 
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While it is probably the case that block Gibbs converges faster than Gibbs, 
it is unlikely that the true difference is anywhere near as large as these num- 
bers suggest. Thus, if we use these results to compare Gibbs and block Gibbs, 
the former will be penalized by the fact that it is simply more analytically 
cumbersome. 



APPENDIX A 

A.l. The elements of C and V. Hobert and Geyer [(1998), page 418] 
show that S,\X,y ~ N(^*,T/) and give the specific forms of ^* = C*{X,y) and 
V = V{X,y). We restate their results here. First we let 



then 



Finally, 



and 



K 



^ = E 






E{9i\X) 



Var(0i|A) = 

Cov(0,,0,|A) = 

Cov(0i,/x|A) = 

Var(/i|A) = 

E{^l\X) 

Xe 
Xg + rriiXe 



1 



Xe + TTljAe 



1 + 



A^ 



{Xe + miXe){sQ + t) 
XI 



[Xe + miXe){Xe + rrijXe) (sq + 1) ' 

{Xg + miXe){so+t)' 



So + t 



So+t 



K 



y-^ niiXeXeVi , 
~{Xe + rriiXe 



So + t 



K 



~{Xe+ rrijXe 



+ 



Xetriiyi 
Xe + miXe 



Table 6 
Constants used to construct total variation bounds 



Sampler 


7 


</> 


Pi 


C3 


dR 


r 


e 


Block Gibbs 
Gibbs 


0.3956 
0.41528 


0.3589 
na 


na 
0.41527 


na 
2.6667 


28.328 
26.010 


0.0111 
0.0009 


0.0246 
5.6x10-^^ 
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Observe that E{fj,\X) is a convex combination of yj and rriQ and, furthermore, 
E{6i\X) is a convex combination of E{ii\X) and yj. If we let A denote the 
length of the convex hull of the set {yi,y2, ■ ■ ■ ,yK,rno}, then for any i = 
1,2,..., K, [E{e,\X) - E{p\X)]^ < A2 and [^(0i|A) - mf < A^. 

APPENDIX B 

B.l. Optimal starting values. We desire the value of {9,fi) that mini- 
mizes 

K K 

Clearly, no matter what values are chosen for the 9i's, the minimizing value 
of fi is 9. Thus, we need to find the value of 9 that minimizes 



K K 

i=l i=l 



hJ2i^i-9f + cP2Y.MGi-yi 



Setting the derivative with respect to 9i equal to yields 

. 4)i9 + (t)2miyi 



(36) 



H + 4>2'mi 



Summing both sides over i and dividing by K yields an equation in 9 whose 
solution can be plugged back into (36) and this yields the optimal starting 
value 



</'iEj=i("ijyj/(</'i + hmj))/T,j=i{'m'j/{(t>i + <p2mj))] + (t)2miyi 



01 + (t)2mi 

APPENDIX C 
C.l. Proof of Lemma 4.1. Let 

Note that x* is the only positive solution to fdx) = fo{x). To prove the 
result it suffices to show that (i) Rq{(3) = /^(x)//o(x) > 1 for all x G (0,x*) 
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and all /3 G (0,c) and that (ii) Rc{(3) = fp{x)/ fc{x) > 1 for all x G (x*,oo) 
and all /? G (0, c). Fix A; > and define a function 

for u>0. Since h{0) = and h'{u) < 0, we know h{u) < for n > 0. Hence, 



(37) 



-— — ^\og{l + ku)<0 



for u > 0. Define another function, 



1 



g{u) = — log(l + ku) 



u 



for n > 0. Since the the left-hand side of (37) is equal to g'{u), we have 
established that g{u) is decreasing for u> 0. Thus, ii x < x* = ^ log(l + ^) 
and /3 £ (0,c), then 

P\ x(3 



\ogRo{(5)=alog[ 1 + 



26 



>«log(l + — 1-— log 



'-I 



:a/3 



^^-(-1 



^^°^(^+^ 



>0, 



and (i) is established. Case (ii) is similar. 

APPENDIX D 

D.l. Proof of Lemma 5.1. First, let g{v) = v + cv~^ , where c > and 
V > 0. It is easy to show that g is minimized at u = ^/c. Thus, 

2 / „, \ 2 

1 



y 



> 



> 



ax + y J \bx + y , 

2te^y^ [y/x + ab{x/y)] + x'^y'^{b'^ + 4ab - a?) 
{ax + yY{bx + yY 

26x2y2 [2^06] + x2y2(?,2 _^ 4^^ _ ^2^ 



(ax + yy(bx + y)'^ 



(ax -|-y)2(6x + ?/)2 

_ x'^y'^{5b-a){b + a) 
{ax + yY{bx + yY 

>0. 
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APPENDIX E 
E.l. SgCCg = Cg, n Cg, n Cg, . First, 
SG = {ie,X):Vsie,X)<d} 
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A) : e"''^" + e"'^^ + 



KSiXt 



+ V3{e,\)<d 



C |(0, A) : e'''^" < d, e'-'^^ < d, ^^ < d, vs{e, A) < dj 

= |(^,A):-^<A,<i^,0<A,<i^,.3(^,A)<4. 
L Kdid C3 C3 J 

As in the proof of Proposition 5.1, Jensen's inequality yields 



\ S0 + KXg ^) - so + KXe 



(mo-y) + „ , ... {G-y) 



So + KXe 



<{mo-yf+vs{6,X), 
and hence Sg is contained in 

Cc:=|(^,A):-^<A,<i^,0<Ae<i^, 
L A did cs C3 



somo + KXgd 
So + -fCAg 



yj <imo-yf + di. 



Let C4 = S7/{K6id) and put q and c„ equal to y — V (mo — y)^ + c? and 
2/ + V (mo — y)^ + d, respectively. Note that Cg = Cgi H Cg2 H Cg-^, where 



CGi = {(^,A):c4<A,<i^|, 

C7G, = |(^,A):0<Ae<i^}, 

Cgs = \ ip, >^>-ci< — - — —F7\ < c„ 



So + KXe 

APPENDIX F 
F.l. Closed form expression for sg. Recall that 

So + KC4 



£g 



so + Klog{d)/c3 



1/2 



9iil^,0)g2{fJ')dedi^i 



32 G. L. JONES AND J. P. HOBERT 

A straightforward calculation shows that 



(38) 



Thus, 



C4C3 
\ogd 



K/2K I - 



m 



■exp 



rrii log d 



+ mi { 2c3(l + mj 



-{fJ'-yif 



9i{fJ',0)92{fJ')d6dfi 



K/2 K 



logd) nyi 



+ mi 



K 



92(/i)n^^P') ~wrTrT-zr\it^-yif }dn. 



i=l 



2c3(l + mi) 



Now 



Define 



and put 



and 



K 



52(/.)exp|-ig^g^(;.-y,)^|d/. 



^so + Ji:iog(d)/c3 
27r 



' exp(-£°±^M^(^_,J^ 



"H l + rrn 



poo 

+ /_ exp 



X exp-i 



2C3 ^1 + 
50 + Klog((j)/C3 

2 

log d x-^ m 



(/« - cif 



2C3 



1 ' 



, ^°Sd( ^ "^i 



C3 



j=l 






mi=v 



m,, = V 



K 



I ^°^d y-^ 



i=l 



X 



yi"ii 



m,; 



logd f^ v^ 
c«so H Kcu + V 7 

C3 \ ^ 1 



i=l 
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Then 



exp 



K 



so + Klog{d)/c3 2I J logd,^ 



rrii 



ifJ- - Vif \ dli 



■ exp 



and 



exp 




i + Ei 



K -2 
yfrui 



+ mi 



2c3 ^ 1 + "^J 



exp 



c?so logfi 



{^--cif Uxp< 



loed 



Si 



m,; 



2C3 ^ 1 + "^^ 



it^-Vif \dn 



2 2c3 



Kcf + Ei 

i=i 



K -2 
yfrrii 



+ mi 



+ 



2u 



Putting all of this together yields 



^G = yv{so + KC4^) 



K 



Hr 



C4C3 



/</2 



exp 



+ exp 



\ IJ{ l + rrii \logd 
clso Kcllogd ml 



exp< 



logd 



K 



yfrui 



2C3 ^ 1 + "Ij 



2C3 



2v 



y-rriu 



cfsQ Kcflogd iTT-f \ ( ^ ^(y~^^i 



+ 



1_$ 



2 2c3 2v ) \ \ y/v 

where $(•) denotes the standard normal cumulative distribution function. 
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